The effective application of contrastive learning in natural language processing tasks shows its strength in text analysis. Constructing positive and negative samples correctly and reasonably is the core challenge of contrastive learning. Because it is difficult to construct contrastive objects in multi-label multi-classification tasks, few contrastive losses exist for multi-label multi-classification text classification. In this paper, we propose five contrastive losses for multi-label multi-classification tasks: Strict Contrastive Loss (SCL), Intra-label Contrastive Loss (ICL), Jaccard Similarity Contrastive Loss (JSCL), Jaccard Similarity Probability Contrastive Loss (JSPCL), and Stepwise Label Contrastive Loss (SLCL). We explore the effectiveness of contrastive learning for multi-label multi-classification tasks under different strategies, and provide a set of baseline methods for contrastive learning on multi-label classification tasks. We also perform an interpretability analysis of our approach to show how the different contrastive learning methods play their roles. The experimental results demonstrate that the proposed contrastive losses can bring some improvement on multi-label multi-classification tasks. Our work reveals that "appropriately" changing the way samples are contrasted is the key to improving the adaptability of contrastive learning in multi-label multi-classification tasks.
Translated by Google Translate
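The first abstract above names a Jaccard Similarity Contrastive Loss but does not give its formula. As a rough illustration only, the sketch below computes the Jaccard similarity between two label sets and uses it to weight pairs in a softmax-style contrastive loss. The function names, the weighting scheme, and the temperature value are assumptions made for this example, not the authors' definition.

```python
import math


def jaccard(labels_a, labels_b):
    """Jaccard similarity |A & B| / |A | B| between two label sets."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 0.0  # convention assumed here for two empty label sets
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def jaccard_weighted_contrastive_loss(embeddings, label_sets, temperature=0.1):
    """Hypothetical loss: each pair (i, j) contributes its negative
    log-softmax similarity, weighted by the Jaccard overlap of its labels.
    Assumes at least two samples and non-zero embeddings."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = [cosine(embeddings[i], embeddings[j]) / temperature
                  for j in others]
        # Numerically stable log-sum-exp over all other samples.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        for j, logit in zip(others, logits):
            weight = jaccard(label_sets[i], label_sets[j])
            total += -weight * (logit - log_denom)
    return total / n
```

Under this weighting, pairs with disjoint label sets contribute nothing to the numerator, while partially overlapping pairs are pulled together in proportion to their overlap.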
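Learning modality-fused representations and processing unaligned multimodal sequences are meaningful and challenging problems in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse the language, visual, and audio modalities. However, these approaches introduce information redundancy when fusing features and are inefficient because they do not consider the complementarity of the modalities. In this paper, we propose an efficient neural network that learns modality-fused representations with a CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first perform feature extraction for each of the three modalities to obtain the local structure of the sequences. We then design a novel transformer with cross-modal blocks (CB-Transformer) that enables complementary learning across modalities and is mainly divided into local temporal learning, cross-modal feature fusion, and global self-attention representations. In addition, we concatenate the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets: IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of the proposed method in both settings. Compared with mainstream methods, our approach reaches state-of-the-art performance with the smallest number of parameters.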
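Audio-video based multimodal emotion recognition has attracted a lot of attention because of its robust performance. Most existing methods focus on proposing different cross-modal fusion strategies. However, these strategies introduce redundancy in the features of different modalities without fully considering the complementary properties of the modal information, and they do not guarantee that the original semantic information is preserved during intra- and inter-modal interactions. In this paper, we propose a novel cross-modal fusion network based on self-attention and residual structure (CFN-SR) for multimodal emotion recognition. First, we perform representation learning for the audio and video modalities, obtaining the semantic features of the two modalities with an efficient ResNeXt and a 1D CNN, respectively. Second, we feed the features of the two modalities separately into the cross-modal blocks, which use the self-attention mechanism and the residual structure to ensure efficient complementarity and completeness of information. Finally, we obtain the emotion output by concatenating the fused representation with the original representation. To verify the effectiveness of the proposed method, we conduct experiments on the RAVDESS dataset. The experimental results show that the proposed CFN-SR achieves state-of-the-art performance, obtaining 75.76% accuracy with 26.30M parameters. Our code is available at https://github.com/skeletonnn/cfn-sr.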
Although weakly-supervised techniques can reduce the labeling effort, it is unclear whether a saliency model trained with weakly-supervised data (e.g., point annotation) can achieve performance equivalent to that of its fully-supervised version. This paper attempts to answer this unexplored question by proving a hypothesis: there exists a point-labeled dataset on which saliency models can be trained to achieve performance equivalent to training on the densely annotated dataset. To prove this conjecture, we propose a novel yet effective adversarial trajectory-ensemble active learning (ATAL) method. Our contributions are three-fold: 1) Our proposed adversarial-attack-triggered uncertainty can overcome the overconfidence of existing active learning methods and accurately locate uncertain pixels. 2) Our proposed trajectory-ensemble uncertainty estimation method maintains the advantages of ensemble networks while significantly reducing the computational cost. 3) Our proposed relationship-aware diversity sampling algorithm can overcome oversampling while boosting performance. Experimental results show that ATAL can find such a point-labeled dataset: a saliency model trained on it obtains 97% to 99% of the performance of its fully-supervised version with only ten annotated points per image.
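As an extensively studied topic in natural language processing (NLP), aspect-based sentiment analysis (ABSA) is the task of predicting the sentiment expressed in a text with respect to a corresponding aspect. Unfortunately, most languages lack sufficient annotated resources, so more and more researchers are focusing on cross-lingual aspect-based sentiment analysis (XABSA). However, recent research concentrates only on cross-lingual data alignment rather than model alignment. To this end, we propose a novel framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based Sentiment Analysis. Based on contrastive learning, we reduce the distance between samples that have the same label but lie in different semantic spaces, so that the semantic spaces of different languages converge. Specifically, we design two contrastive strategies, token-level contrastive learning of token embeddings (TL-CTE) and sentiment-level contrastive learning of token embeddings (SL-CTE), to regularize the semantic spaces of the source and target languages and make them more uniform. Since our framework can receive datasets in multiple languages during training, it can be adapted not only to the XABSA task but also to multilingual aspect-based sentiment analysis (MABSA). To further improve the performance of the model, we apply knowledge distillation that leverages unlabeled data in the target language. In the distillation XABSA task, we further explore the comparative effectiveness of different data (source dataset, translated dataset, and code-switched dataset). The results demonstrate that the proposed method brings a certain improvement on the three tasks of XABSA, distillation XABSA, and MABSA. For reproducibility, the code for this paper is available at https://github.com/gklmip/cl-xabsa.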
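Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily. However, deep learning-based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences are available with real fixations recorded in real visual-audio environments. Re-collecting real fixations under the same visual-audio circumstances would be neither efficient nor necessary. To address this problem, this paper proposes a novel weakly-supervised approach that alleviates the demand for large-scale training sets when training visual-audio models. Using only video category tags, we propose selective class activation mapping (SCAM) and its upgrade (SCAM+). In the spatial-temporal-audio circumstance, the former follows a coarse-to-fine strategy to select the most discriminative regions, and these regions usually exhibit high consistency with real human-eye fixations. The latter equips SCAM with an additional multi-granularity perception mechanism, making the whole process more consistent with the real human visual system. Moreover, we distill knowledge from these regions to obtain complete new spatial-temporal-audio (STA) fixation prediction (FP) networks, enabling broad application in cases where video tags are not available. Without resorting to any real human-eye fixations, the performance of these STA FP networks is comparable to that of fully supervised networks. The code and results are publicly available at https://github.com/guotaowang/stanet.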
Benefiting from its intrinsic ability to exploit supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
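The CCGC abstract describes an objective that raises cross-view cosine similarity for samples in the same high-confidence cluster and lowers it between the centers of different clusters. The sketch below is one simple way to write such an objective. The exact loss form, the function names, and the assumption that `assignments` already contains only high-confidence samples are all illustrative choices, not the paper's formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def cluster_centers(embeddings, assignments):
    """Mean embedding of each cluster id found in `assignments`."""
    sums, counts = {}, {}
    for vec, cluster in zip(embeddings, assignments):
        if cluster not in sums:
            sums[cluster] = [0.0] * len(vec)
            counts[cluster] = 0
        sums[cluster] = [s + x for s, x in zip(sums[cluster], vec)]
        counts[cluster] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def cluster_guided_loss(view1, view2, assignments):
    """Hypothetical objective: (1 - similarity) averaged over positive
    cross-view pairs plus the average similarity between the centers of
    different clusters. Lower is better."""
    n = len(assignments)
    # Positive pairs: same cluster, one embedding from each view.
    positives = [cosine(view1[i], view2[j])
                 for i in range(n) for j in range(n)
                 if assignments[i] == assignments[j]]
    # Negative pairs: centers of different clusters, one from each view.
    centers1 = cluster_centers(view1, assignments)
    centers2 = cluster_centers(view2, assignments)
    negatives = [cosine(centers1[a], centers2[b])
                 for a in centers1 for b in centers2 if a != b]
    pos_term = sum(1.0 - s for s in positives) / len(positives)
    neg_term = sum(negatives) / len(negatives) if negatives else 0.0
    return pos_term + neg_term
```

Using cluster centers as negatives keeps the number of negative pairs proportional to the square of the cluster count rather than the sample count, which is one practical reason such a design can be attractive.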
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the environment returns from conventional evaluation as coefficients to build an importance map. We also conduct experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes the important frames in the demonstrations.
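The R2RISE procedure described above (random frame masks, retraining, return-weighted aggregation) can be sketched as follows. This is a minimal illustration under stated assumptions: `train_and_evaluate` is a hypothetical callback standing in for retraining the IL model on the masked demonstration and returning its environment return, and the per-frame normalization by how often a frame was kept is a choice made for this example.

```python
import random


def r2rise_importance(num_frames, train_and_evaluate,
                      num_masks=200, keep_prob=0.5, seed=0):
    """Estimate per-frame importance from randomly masked demonstrations.

    train_and_evaluate(mask) is a hypothetical callback: it receives a list
    of 0/1 flags (1 = frame kept), retrains the black-box model on the kept
    frames, and returns the resulting environment return as a float.
    """
    rng = random.Random(seed)
    weighted = [0.0] * num_frames  # sum of returns over masks keeping frame t
    kept = [0] * num_frames        # number of masks that kept frame t
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0
                for _ in range(num_frames)]
        score = train_and_evaluate(mask)
        for t, flag in enumerate(mask):
            if flag:
                weighted[t] += score
                kept[t] += 1
    # Average return observed when each frame was present in the training set.
    return [w / k if k else 0.0 for w, k in zip(weighted, kept)]
```

A frame whose presence consistently coincides with high returns ends up with a high score, which is the intuition behind using returns as mask coefficients.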
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. For clustering to promote topic extraction, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and have a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other and achieving the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) that integrates the two tasks into a unified framework, achieving high-quality clustering results and extracting topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, it pays high attention to topic-related words for topic extraction because of its self-attention architecture. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations within it.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs or tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, yielding what we call the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting previously learned classes.